    On the use of integer programming to pursue optimal microaggregation

    CNR-IASI. This document reports on a research collaboration at CNR-IASI (Italy) that ran until the 7th of January. Microaggregation is a method for perturbing data in order to prevent the identification of individuals in microdata. In optimization terms, it is a clustering problem: individuals are grouped into clusters of at least a minimum size so that the total within-cluster spread is minimized. For multivariate data the problem is NP-hard, and no efficient procedure is known that guarantees optimality. The document reviews the state of the art on this topic, covering heuristic clustering algorithms and Integer Programming. In addition, inspired by the use of Column Generation in an approximate model, it proposes a scheme for solving microaggregation to optimality. The Column Generation block has been developed in depth, with particular attention to the polyhedral aspects of the Pricing Problem. This first block has also been implemented with CPLEX, and its results are reported. At the current stage, the procedure reaches optimality on some data instances and, in every case, yields a lower bound on the spread of the microaggregation. These results are new contributions and encourage us to pursue this line of research.
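    The clustering formulation can be made concrete as a set-partitioning problem: every feasible cluster (a "column") carries its within-cluster sum of squared errors as its cost, and the goal is to choose disjoint columns that cover all points at minimum total cost. Column Generation works with the LP relaxation of such a model and prices columns on demand instead of listing them all. The minimal Python sketch below does the opposite for illustration only: it enumerates every column and solves the partitioning exactly by dynamic programming over subsets, so it is usable only for a handful of points. It assumes spread is measured as SSE and relies on the known fact that some optimal solution uses only clusters of size k to 2k - 1. The function names are hypothetical; this is not the CPLEX implementation described in the report.

```python
from functools import lru_cache
from itertools import combinations


def cluster_sse(points, members):
    """Sum of squared Euclidean distances from each member to the cluster centroid."""
    dim = len(points[0])
    size = len(members)
    centroid = [sum(points[i][d] for i in members) / size for d in range(dim)]
    return sum((points[i][d] - centroid[d]) ** 2 for i in members for d in range(dim))


def optimal_microaggregation(points, k):
    """Exact minimum-SSE partition into clusters of size >= k (tiny inputs only).

    Hypothetical illustration: each candidate cluster of size k..2k-1 is a column
    whose cost is its SSE; a DP over bitmasks solves the set partitioning exactly.
    Returns (total_sse, clusters) where clusters is a tuple of index tuples.
    """
    n = len(points)
    if n < k:
        raise ValueError("need at least k points")

    @lru_cache(maxsize=None)
    def best(remaining):
        # remaining: bitmask of points not yet assigned to any cluster.
        if remaining == 0:
            return 0.0, ()
        idx = [i for i in range(n) if (remaining >> i) & 1]
        first, rest = idx[0], idx[1:]
        best_cost, best_clusters = float("inf"), ()
        # The lowest unassigned point must lie in some cluster: try every column containing it.
        for size in range(k, min(2 * k - 1, len(idx)) + 1):
            for others in combinations(rest, size - 1):
                members = (first,) + others
                mask = 0
                for i in members:
                    mask |= 1 << i
                sub_cost, sub_clusters = best(remaining & ~mask)
                cost = cluster_sse(points, members) + sub_cost
                if cost < best_cost:
                    best_cost, best_clusters = cost, (members,) + sub_clusters
        return best_cost, best_clusters

    return best((1 << n) - 1)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(optimal_microaggregation(pts, 2))  # → (1.0, ((0, 1), (2, 3)))
```

    Because the number of columns grows combinatorially, exhaustive enumeration quickly becomes impractical, which is what motivates generating only promising columns through a Pricing Problem instead.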

    An optimization-based decomposition heuristic for the microaggregation problem

    Given a set of points, the microaggregation problem aims to find a clustering with minimum sum of squared errors (SSE) in which every cluster contains at least k points. Points in each cluster are replaced by the cluster centroid, thus satisfying k-anonymity. Microaggregation is considered one of the most effective techniques for protecting numerical microdata. Traditionally, non-optimal solutions to the microaggregation problem are obtained by heuristic approaches. Recently, the authors of this paper presented a mixed integer linear optimization (MILO) approach based on column generation for computing tight solutions and lower bounds for the microaggregation problem. However, MILO can be computationally expensive for large datasets. In this work we present a new heuristic that combines three blocks: (1) a decomposition of the dataset into subsets, (2) the MILO column generation algorithm applied to each subset to obtain a valid microaggregation, and (3) a local search improvement algorithm to obtain the final clustering. Preliminary computational results show that this approach was able to match (and in some cases improve upon) some of the best solutions (i.e., of smallest SSE) reported in the literature for the Tarragona and Census datasets with k ∈ {3, 5, 10}. Peer Reviewed. Postprint (author's final draft).
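    The SSE objective, the final local search block and the centroid replacement can be illustrated with a minimal Python sketch. It assumes points are tuples of floats and a clustering is a list of index lists in which every cluster already has at least k points; the starting clustering stands in for the output of the MILO column generation step. The first-improvement single-point move used here, and all function names, are hypothetical illustrations, not the local search algorithm of the paper.

```python
def centroid(points, members):
    """Component-wise mean of the points whose indices are in `members`."""
    dim = len(points[0])
    return tuple(sum(points[i][d] for i in members) / len(members) for d in range(dim))


def sse(points, clusters):
    """Total sum of squared errors of a clustering (list of index lists)."""
    total = 0.0
    for members in clusters:
        c = centroid(points, members)
        total += sum((points[i][d] - c[d]) ** 2 for i in members for d in range(len(c)))
    return total


def try_move(points, clusters, k):
    """Apply the first single-point move that strictly lowers SSE; return True if one was made."""
    current = sse(points, clusters)
    for a, src in enumerate(clusters):
        if len(src) <= k:
            continue  # removing a point would drop this cluster below size k
        for p in src:
            for b in range(len(clusters)):
                if b == a:
                    continue
                candidate = [list(m) for m in clusters]
                candidate[a].remove(p)
                candidate[b].append(p)
                if sse(points, candidate) < current - 1e-12:
                    clusters[:] = candidate  # keep the improving move
                    return True
    return False


def local_search(points, clusters, k):
    """Repeat improving moves until none is left; every cluster keeps at least k points."""
    clusters = [list(m) for m in clusters]
    while try_move(points, clusters, k):
        pass
    return clusters


def anonymize(points, clusters):
    """Replace each point by its cluster centroid (k-anonymous when all clusters have >= k points)."""
    released = [None] * len(points)
    for members in clusters:
        c = centroid(points, members)
        for i in members:
            released[i] = c
    return released


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
    start = [[0, 1, 3], [2, 4, 5]]  # deliberately poor starting clustering
    final = local_search(pts, start, k=2)
    print(final, sse(pts, final))  # → [[0, 1, 2], [4, 5, 3]] 4.0
    print(anonymize(pts, final))
```

    Since every accepted move strictly lowers the SSE and there are finitely many clusterings, the loop always terminates; single-point moves alone can stall in local optima, so a fuller local search would likely also consider swaps between clusters.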